feat(content_safety): add support to auto select multilingual refusal bot messages #1530
base: develop
Conversation
Detect the user input language and return refusal messages in the same language when content safety rails block unsafe content. Supports 9 languages: English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, and Thai.
```python
DEFAULT_REFUSAL_MESSAGES: Dict[str, str] = {
    "en": "I'm sorry, I can't respond to that.",
    "es": "Lo siento, no puedo responder a eso.",
    "zh": "抱歉,我无法回应。",
    "de": "Es tut mir leid, darauf kann ich nicht antworten.",
    "fr": "Je suis désolé, je ne peux pas répondre à cela.",
    "hi": "मुझे खेद है, मैं इसका जवाब नहीं दे सकता।",
    "ja": "申し訳ありませんが、それには回答できません。",
    "ar": "عذراً، لا أستطيع الرد على ذلك.",
    "th": "ขออภัย ฉันไม่สามารถตอบได้",
}
```
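A minimal sketch of how the table above could drive message selection, assuming a `get_refusal_message` helper (hypothetical, not necessarily the PR's actual function name) that falls back to English for unsupported codes:

```python
from typing import Dict

# Subset of the supported languages, for brevity.
DEFAULT_REFUSAL_MESSAGES: Dict[str, str] = {
    "en": "I'm sorry, I can't respond to that.",
    "es": "Lo siento, no puedo responder a eso.",
    "zh": "抱歉,我无法回应。",
}


def get_refusal_message(lang_code: str) -> str:
    """Return the refusal message for lang_code, defaulting to English."""
    return DEFAULT_REFUSAL_MESSAGES.get(lang_code, DEFAULT_REFUSAL_MESSAGES["en"])
```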
If we later had other multilingual rails, would we be repeating this mechanism in each rail? Or just the set of supported languages per rail? I don't think we need to do it now (since we don't have other multilingual rails to test it), but we should be aware of what refactoring would be needed to move the below language detection to a shared level.
Of course, we can relax this constraint later and allow users more flexibility. Once we need to support other models or other types of rails (beyond content safety) that require multilingual responses, we can:
- Move the `detect_language` action from `library/content_safety/actions.py` to a shared location (`nemoguardrails/actions/`), making it available to all rails.
- We could also introduce a Colang-level abstraction like `bot refuse to respond $multilang=true`; this could be done easily for Colang 2.0, but I think it is better if we don't add new Colang features for now.
I agree, for now, keeping it scoped to content safety keeps the implementation focused.
```python
try:
    from fast_langdetect import detect

    result = detect(text, k=1)
    # ... (excerpt; the rest of the block continues in the diff)
```
Does fast-langdetect ever return a full locale with dialect, like en-US versus en? I don't see it in the docs, but I do see some upper/lowercase inconsistency.
Fair point, thanks for raising it. I just took a closer look at the fast-langdetect source code and the fastText model behavior:
- The fast-langdetect README mentions BCP-47 tags like `"zh-cn"` and `"pt-br"`,
- but the fastText lid.176.bin model uses simple ISO 639 codes: `zh`, `pt`, `en`, etc.
- The fast-langdetect source simply strips the `__label__` prefix from the fastText output; no regional mapping is applied.

Validated with an actual test:

```python
>>> detect("抱歉,我无法处理该请求", k=2)
[{'lang': 'zh', 'score': 0.80}, {'lang': 'ta', 'score': 0.08}]
```

It returns `"zh"`, NOT `"zh-cn"`, so no regional-variant handling is needed.
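Even so, a small defensive normalization step (hypothetical; not part of the PR) would make the lookup robust to case or regional-subtag surprises from any future detector:

```python
def normalize_lang_code(code: str) -> str:
    # Lowercase and strip any regional subtag, so "zh-CN", "zh_CN", and
    # "ZH" all normalize to "zh". Purely defensive: lid.176.bin already
    # emits bare lowercase ISO 639 codes.
    return code.lower().replace("_", "-").split("-")[0]
```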
This looks really good @Pouyanpi ! I have a few comments:
Not needed in this PR, but I'm thinking of RAG prompts where LLM instructions, the user query, and relevant context chunks are all flattened into a single prompt. These prompts can be pretty long (up to 7k tokens in some cases). I would be interested in a follow-on where we sample part of a prompt (e.g. 200 chars) before running classification on the sample. This would be an optional config field, giving customers a knob to trade off accuracy vs. latency for language detection.
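The sampling idea above could be as simple as a prefix slice; here is a sketch, where `sample_chars` stands in for the proposed (hypothetical) optional config field:

```python
def sample_for_detection(prompt: str, sample_chars: int = 200) -> str:
    # Classify only a fixed-size prefix of the (possibly very long,
    # flattened RAG) prompt, trading detection accuracy for latency.
    return prompt[:sample_chars]
```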
I've included them in the temp/lang-detect-benchmark branch to make review easier. If you find it easier, I will do that.
Yes
Yes, I would like to avoid adding Colang-level features as much as possible.
Done! updated the description.
fast-langdetect already does the truncation by default, but indeed we can give that flexibility to the users.
Why wouldn't we merge them into develop? It's best practice in ML to make any results reproducible, for which we need the input datasets and scripts. The datasets are public and linked above. I'd imagine we'll have to re-run evals for new languages as they're added to the content-safety and other models. So we'll run this script periodically.
Was that measured at a concurrency of 1? Having a 100% overhead for each language inference is a lot higher than I'd expect. We don't need to fix it in this PR.
+1
Could you check? I didn't see any length description.
Could you add optional Pydantic fields for any of these values that it makes sense to expose to users? Looking at the config, I think
Description
Detect user input language and return refusal messages in the same language when content safety rails block unsafe content. Supports 9 languages: English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, and Thai.
TODO:
Language Detection Benchmark Results
Datasets Used
Chinese samples in Nemotron are all REDACTED; Chinese coverage was validated via the papluca dataset.
Prompt Length Analysis (characters)
Note: fast-langdetect truncates input at 80 characters by default (`max_input_length=80`), so longer prompts are effectively evaluated on their first 80 chars.

Overall Accuracy comparison
Latency comparison (μs)
Per Language Accuracy (fast-langdetect)
Per-Language Accuracy (lingua)
Why fast-langdetect?
https://github.com/LlmKira/fast-langdetect
Error analysis
Most errors occur with:
The action correctly falls back to English (en) for unsupported detected languages.
Benchmark Scripts
Check out the temp/lang-detect-benchmark branch.
Located in eval/language_detection/:
Make sure to have `datasets` and `pandas` installed:
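Assuming a standard Python environment, this is the usual pip invocation:

```shell
# Install the benchmark dependencies: datasets for loading the public
# corpora, pandas for aggregating the results.
pip install datasets pandas
```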